Creating a Multilingual Data Collection for Bilingual Lexicography from Parallel Monolingual Lexicons
Abstract
In the DELIS project¹, a set of parallel monolingual lexicon fragments has been designed. They are parallel in two ways: (1) they cover the same fragment (the most general verbs of sensory perception and of speech acts), and (2) they are based on the same theoretical approaches and on comparable classifications and descriptive devices. This paper claims that such parallel fragments, formalized and represented in a modular and access-neutral way, constitute a lexical data collection that can serve as a pre-dictionary fact base from which bilingual dictionaries can be derived. We discuss examples of the procedures by which raw material for bilingual dictionaries can be derived (semi-)automatically from the fact base.
1. Metalexicographic introduction: monolingual and bilingual dictionaries; the role of a pre-dictionary fact base
In metalexicography, there has been some discussion of directional as opposed to non-directional bilingual dictionaries. Directional dictionaries, as advocated and illustrated by [Kromann 1989] and [Kromann/Riiber/Rosbach 1989], aim at efficiency of presentation, taking the users' perspective and the users' mother tongue as a starting point. For an "active" translation dictionary of the directional type (from the users' mother tongue to their "foreign" language), the main objective is to make clear those cases where the target language differs considerably from the source language. If the target-language lexical items display the same properties as the source-language items, the lexicographer can leave their description (partly) underspecified to save space in the article; it is not felt necessary, for example, to describe reading distinctions that are parallel in both languages. The non-directional dictionary² aims at explicitness rather than economy of space: ideally, the relevant distinctions of the source language are always made explicit, even if they happen to exist in parallel fashion in the target language³.
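The derivation idea and the directional/non-directional contrast can be illustrated with a toy sketch. It assumes, hypothetically, that every reading in two parallel fragments carries a shared classification label (the `frame` strings below are invented stand-ins for the "comparable classifications and descriptive devices", not DELIS's actual labels). The names `Reading`, `derive_pairs` and `render`, the exact-label matching rule, and the sample entries are all illustrative assumptions, not the procedures discussed in the paper. Source and target readings sharing a label are paired; the pairs are then rendered either explicitly (non-directional) or compactly, leaving reading distinctions underspecified when they all map to one target lemma (directional).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One reading (sense) of a lemma in a monolingual lexicon fragment."""
    lemma: str
    frame: str  # hypothetical classification label shared across languages
    gloss: str


def derive_pairs(source, target):
    """Pair each source reading with every target reading that has the same label."""
    by_frame = defaultdict(list)
    for reading in target:
        by_frame[reading.frame].append(reading)
    # dict.get does not insert missing keys, so unmatched readings yield no pairs.
    return [(s, t) for s in source for t in by_frame.get(s.frame, [])]


def render(pairs, directional):
    """Render one bilingual article line per source lemma.

    Directional mode collapses a lemma's readings into a single equivalence
    when all of them map to the same target lemma (parallel distinctions);
    non-directional mode always spells out every reading.
    """
    by_lemma = defaultdict(list)
    for s, t in pairs:
        by_lemma[s.lemma].append((s, t))
    lines = []
    for lemma in sorted(by_lemma):
        entries = by_lemma[lemma]
        targets = {t.lemma for _, t in entries}
        if directional and len(targets) == 1:
            lines.append(f"{lemma} -> {targets.pop()}")
        else:
            parts = [f"({s.gloss}) {t.lemma}" for s, t in entries]
            lines.append(f"{lemma}: " + "; ".join(parts))
    return lines


# Toy entries for illustration only (not DELIS data).
english = [
    Reading("see", "perception.visual", "perceive with the eyes"),
    Reading("see", "cognition.understand", "understand"),
    Reading("hear", "perception.auditory", "perceive sounds"),
    Reading("hear", "cognition.learn_of", "learn of"),
]
german = [
    Reading("sehen", "perception.visual", "mit den Augen wahrnehmen"),
    Reading("verstehen", "cognition.understand", "begreifen"),
    Reading("hören", "perception.auditory", "Geräusche wahrnehmen"),
    Reading("hören", "cognition.learn_of", "erfahren"),
]

pairs = derive_pairs(english, german)
for mode in (True, False):
    print("directional" if mode else "non-directional")
    for line in render(pairs, directional=mode):
        print("  " + line)
# directional
#   hear -> hören
#   see: (perceive with the eyes) sehen; (understand) verstehen
# non-directional
#   hear: (perceive sounds) hören; (learn of) hören
#   see: (perceive with the eyes) sehen; (understand) verstehen
```

In directional mode the two readings of *hear* collapse because both map to *hören*, while the distinction for *see* is kept because it maps to different German lemmas. Exact label equality is the simplest possible matching rule; real parallel fragments would likely need a richer correspondence check.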
The non-directional approach seems to support the "reuse" of lexical descriptions more readily: the experiments carried out on the Van Dale dictionaries by [Al 1988], [Heid 1990] and [Martin/van der Vliet 1992] demonstrated that non-directional lexical descriptions can be reused quite easily⁴. The difference between directional and non-directional dictionaries can thus be paraphrased (if we allow ourselves some simplification) as a trade-off between efficiency and explicitness. It thus has …
¹ DELIS stands for "Descriptive Lexical Specifications and tools for corpus-based lexicon building". DELIS (February 1993 through December 1995) is a shared-cost project partly funded by the DG …
Similar resources
Multilingual Linguistic Resources: From Monolingual Lexicons to Bilingual Interrelated Lexicons
This paper describes a procedure to convert the PAROLE-SIMPLE monolingual lexicons into bilingual interrelated lexicons where each word sense of a given language is linked to the pertinent sense of the right words in one or more target lexicons. Nowadays, SIMPLE lexicons are monolingual although the ultimate goal of these harmonised monolingual lexicons is to build multilingual lexical resource...
Developing Parallel Sense-tagged Corpora with Wordnets
Semantically annotated corpora play an important role in natural language processing. This paper presents the results of a pilot study on building a sense-tagged parallel corpus, part of ongoing construction of aligned corpora for four languages (English, Chinese, Japanese, and Indonesian) in four domains (story, essay, news, and tourism) from the NTU-Multilingual Corpus. Each subcorpus is firs...
EXETER at CLEF 2003: Experiments with Machine Translation for Monolingual, Bilingual and Multilingual Retrieval
The University of Exeter group participated in the monolingual, bilingual and multilingual-4 retrieval tasks this year. The main focus of our investigation this year was the small multilingual task comprising four languages, French, German, Spanish and English. We adopted a document translation strategy and tested four merging techniques to combine results from the separate document collections...
University of Hagen at CLEF 2004: Indexing and Translating Concepts for the GIRT Task
This paper describes the work done at the University of Hagen for our participation at the German Indexing and Retrieval Test (GIRT) task of the CLEF 2004 evaluation campaign. We conducted both monolingual and bilingual information retrieval experiments. For monolingual experiments with the German document collection, the focus is on applying and comparing three indexing methods targeting full ...
Computational bilingual lexicography: automatic extraction of translation dictionaries
The paper describes a simple but very effective approach to extracting translation equivalents from parallel corpora. We briefly present the multilingual parallel corpus used in our experiments and then describe the pre-processing steps, a baseline iterative method, and the actual algorithm. The evaluation for the two algorithms is presented in some detail in terms of precision, recall and pro...